
perf: reuse searcher across batch queries in find_evidence.py#145

Open
tarilabs wants to merge 2 commits into redhat-documentation:main from tarilabs:tarilabs-20260504-reuseindex

Conversation

@tarilabs
Contributor

@tarilabs tarilabs commented May 4, 2026

In batch mode, ensure_index() was called once per query via retrieve_evidence(), redundantly reloading the SentenceTransformer model, reopening the Milvus DB, and rebuilding the entire BM25 index on every iteration (~2-6 s of overhead per query).

Call ensure_index() once before the loop and reuse the returned searcher for all queries within each batch invocation. This saves ~18-84s in the scope-req-audit step (~10-15 queries) and ~18-174s in the code-evidence step (~10-30 queries with two-pass retrieval).
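The quoted savings are consistent with building the index once per batch invocation instead of once per query: with N queries and a per-build overhead of t seconds, the saving is (N - 1) * t. A quick check of the endpoints (`saved_seconds` is a hypothetical helper, not part of the script):

```python
def saved_seconds(num_queries: int, overhead_s: float) -> float:
    """Time saved by building the index once instead of once per query."""
    # Per-query builds cost num_queries * overhead; hoisting keeps one build.
    return (num_queries - 1) * overhead_s


# scope-req-audit: ~10-15 queries, ~2-6 s per build
print(saved_seconds(10, 2), saved_seconds(15, 6))  # 18 84
# code-evidence: ~10-30 queries
print(saved_seconds(10, 2), saved_seconds(30, 6))  # 18 174
```

This assumes one build per batch invocation; if the two retrieval passes in the code-evidence step run as separate invocations, each pays for one build and the saving shrinks accordingly.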

Single-query mode is unchanged — it still delegates to retrieve_evidence().
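The change amounts to hoisting loop-invariant work out of the loop. A minimal, self-contained sketch of the before/after shape: `IndexManager`, `Searcher`, `run_batch_before` and `run_batch_after` are hypothetical stand-ins, and the real `ensure_index(args.repo, reindex=args.reindex)` returns both a searcher and index metadata rather than a bare searcher.

```python
class Searcher:
    """Hypothetical searcher; the real one wraps embeddings, Milvus and BM25."""

    def search(self, query: str) -> list[str]:
        return [f"match for {query!r}"]


class IndexManager:
    """Hypothetical stand-in for ensure_index() that only counts builds."""

    def __init__(self) -> None:
        self.builds = 0

    def ensure_index(self) -> Searcher:
        # In the real script this step reloads the model, reopens the
        # vector DB and rebuilds BM25 (the ~2-6 s cost per call).
        self.builds += 1
        return Searcher()


def retrieve_evidence(index: IndexManager, query: str) -> list[str]:
    # Convenience wrapper: builds the index again on every call.
    return index.ensure_index().search(query)


def run_batch_before(index: IndexManager, queries: list[str]) -> list[list[str]]:
    # Old behavior: one index build per query.
    return [retrieve_evidence(index, q) for q in queries]


def run_batch_after(index: IndexManager, queries: list[str]) -> list[list[str]]:
    # New behavior: build once, then reuse the same searcher in the loop.
    searcher = index.ensure_index()
    return [searcher.search(q) for q in queries]


queries = ["auth flow", "retry policy", "config keys"]
before, after = IndexManager(), IndexManager()
assert run_batch_before(before, queries) == run_batch_after(after, queries)
print(before.builds, after.builds)  # 3 1
```

Single-query mode corresponds to a single `retrieve_evidence` call, which pays for exactly one build either way, so leaving it on the old path costs nothing.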

Summary by CodeRabbit

  • Improvements
    • Batch retrieval is faster and more efficient for multi-query runs.
    • Path-based filtering now resolves relative filter paths against the repository root.
    • Search output now shows relevance scores rounded for readability.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: tarilabs <matteo.mortari@gmail.com>
@coderabbitai

coderabbitai Bot commented May 4, 2026

Walkthrough

Batch evidence retrieval was refactored to build and reuse a shared search index via ensure_index, resolve filter_paths relative to the repo root, and format raw search matches with a new _format_result. Single-query mode still uses retrieve_evidence via _run_single.

Changes

Batch Retrieval Refactor

Layer / File(s) Summary
Imports
plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py (line 26)
Adds Path import from pathlib for repository-root path resolution.
Utilities / Data Shape
plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py (lines 36–75)
Adds _resolve_filter_paths(repo_path, filter_paths) to resolve/normalize filter paths and _format_result(query, filter_paths, repo_path, index_info, results) to convert raw searcher matches into the evidence output structure with per-match fields and rounded vector, bm25, and combined scores.
Module Imports / Indexing
plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py (lines 133–144)
Imports ensure_index (in addition to keeping retrieve_evidence) to enable creating/reusing a shared index for batch runs; updates related comments.
Batch Mode Core Change
plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py (lines 159–174)
Batch flow now calls ensure_index(args.repo, reindex=args.reindex) once to obtain searcher and index_info, resolves each entry’s filter_paths with _resolve_filter_paths, runs searcher.search(...) for each query, and formats results with _format_result; removes prior per-query reindex-on-first-query logic.
Output Shape / Wiring
plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py (lines 172–174)
Maintains the outer results list objects { "query": ..., "filter_paths": ..., "result": ... }, but result now contains the formatted payload from _format_result(...) instead of the previous per-query retrieve_evidence output.
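A hypothetical sketch of the formatting step and the outer wrapper. It uses the four-argument `_format_result(query, repo_path, index_info, results)` signature proposed later in the review thread; the field names (`file`, `vector_score`, `bm25_score`, `combined_score`, `repo`, `index`, `matches`), the 4-decimal rounding, and the `_wrap` helper (the script builds this dict inline) are all assumptions.

```python
from typing import Any


def _format_result(query: str, repo_path: str, index_info: dict, results: list[dict]) -> dict[str, Any]:
    """Convert raw searcher matches into the evidence payload (hypothetical shape)."""
    return {
        "query": query,
        "repo": repo_path,
        "index": index_info,
        "matches": [
            {
                "file": m["file"],
                # Round each score so the JSON output stays readable.
                "vector_score": round(m["vector_score"], 4),
                "bm25_score": round(m["bm25_score"], 4),
                "combined_score": round(m["combined_score"], 4),
            }
            for m in results
        ],
    }


def _wrap(query: str, filter_paths, result: dict) -> dict[str, Any]:
    # filter_paths is attached in the outer wrapper, so _format_result never needs it.
    return {"query": query, "filter_paths": filter_paths, "result": result}


raw = [{"file": "src/auth.py", "vector_score": 0.123456, "bm25_score": 7.891011, "combined_score": 0.87654321}]
entry = _wrap("auth flow", ["src"], _format_result("auth flow", "/repo", {"chunks": 42}, raw))
print(entry["result"]["matches"][0])
```

Keeping `filter_paths` in the wrapper preserves the outer `{ "query", "filter_paths", "result" }` shape while the payload inside `result` changes.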
Single-Query Mode
plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py (lines 144–158)
Retains _run_single(...) behavior that calls retrieve_evidence for single-query invocations; comments/parse logic updated to align with the new batch interface.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 10
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'perf: reuse searcher across batch queries in find_evidence.py' accurately and specifically describes the main performance optimization in the PR—reusing a shared search index across batch queries instead of rebuilding it per query.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
No Real People Names In Style References ✅ Passed No instances of real people's names used as style references in code-evidence skill files or documentation were found.
Git Safety Rules ✅ Passed The pull request modifies only the code evidence search script with performance optimizations and no git operations or safety violations.
No Untrusted Mcp Servers ✅ Passed The pull request modifies only a Python utility script for code evidence retrieval with performance optimizations and no MCP server installations or untrusted dependencies.
Skill And Script Conventions ✅ Passed The file find_evidence.py adheres to all Skill and Script Conventions requirements. It uses no plugin: prefixed skill references or old slash-command syntax. Script invocation patterns shown in the docstring correctly use relative paths for co-located script calls. The imports are proper Python library imports from the code-finder package rather than script invocations, and no cross-skill script calls that would require ${CLAUDE_PLUGIN_ROOT} are present in the modified code.
Plugin Registry Consistency ✅ Passed PR modifies only Python implementation code, not plugin.json, marketplace.json, or plugin documentation files.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py`:
- Around lines 133-134: The imports in find_evidence.py are unsorted, which fails
the linter (I001). Move the import of
claude_context.skills._index_manager.ensure_index so that it precedes
claude_context.skills.evidence_retrieval.retrieve_evidence, then re-run the
project's import sorter/formatter (or reorder the two import lines manually).
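Assuming both are plain `from ... import` statements (the file's exact import lines are not quoted here), the order the comment asks for would read as follows; this fragment needs the project's `claude_context` package and is not runnable on its own:

```python
# Order requested by the I001 comment: ensure_index before retrieve_evidence.
from claude_context.skills._index_manager import ensure_index
from claude_context.skills.evidence_retrieval import retrieve_evidence
```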

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ad13bc82-68d5-4b48-823a-975e819d430b

📥 Commits

Reviewing files that changed from the base of the PR and between 4651e72 and 3e58c2f.

📒 Files selected for processing (1)
  • plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py

Comment thread plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py Outdated
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py`:
- Line 44: The function _format_result declares an unused parameter
filter_paths; remove filter_paths from the function signature and update any
call sites that pass filter_paths (the invocation that builds the outer wrapper
dict) to stop supplying it—ensure the wrapper still adds filter_paths separately
as before; only change _format_result's signature and caller arguments, leaving
the body and other returned fields untouched.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 054acbb3-523e-41a8-bd8a-45faf40acd8f

📥 Commits

Reviewing files that changed from the base of the PR and between 3e58c2f and 5e18ff3.

📒 Files selected for processing (1)
  • plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py

```python
    return [str((repo_root / p).resolve()) for p in filter_paths]


def _format_result(query, filter_paths, repo_path, index_info, results):
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove unused filter_paths parameter.

The filter_paths parameter is declared but never referenced in the function body. At line 172, filter_paths is added to the outer wrapper dict separately, not via this function.

Proposed fix
```diff
-def _format_result(query, filter_paths, repo_path, index_info, results):
+def _format_result(query, repo_path, index_info, results):
```

And update the call site at line 171:

```diff
-        result = _format_result(query, filter_paths, repo_path, index_info, raw)
+        result = _format_result(query, repo_path, index_info, raw)
```


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant